Translation at the Carpentries:

Technology past, present, and future



Joel Nitta

https://joelnitta.github.io/carpentrycon_2022_trans

How to translate?

  • Not as simple as just re-writing text in another language as if translating a novel

  • Carpentries lessons are technical documents (rendered using software) and therefore present unique challenges

Challenges in technical translation

  • Need to be able to
    • update translation when original changes
    • deal with source code vs. rendered version
  • Most solutions for translating source code (gettext and PO files) are designed with software in mind, not prose text
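The first challenge, keeping a translation in sync when the original changes, can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions: the strings and translations are made up, and real tools such as gettext also handle near-matches ("fuzzy" entries), which this sketch ignores.

```r
# Strings extracted from the previous and the updated version of a lesson
# (hypothetical examples, not taken from a real lesson)
old_source <- c("Introduction", "Run the script.", "Key Points")
new_source <- c("Introduction", "Run the script from the terminal.", "Key Points")

# Existing translations, keyed by the original string
translated <- c(
  "Introduction" = "はじめに",
  "Run the script." = "スクリプトを実行します。",
  "Key Points" = "まとめ"
)

# Strings that are new or changed and therefore still need translating
needs_translation <- setdiff(new_source, names(translated))

# Translations whose original string no longer exists in the lesson
obsolete <- setdiff(names(translated), new_source)

print(needs_translation)
print(obsolete)
```

Only the one edited sentence shows up as needing work; the unchanged strings keep their existing translations.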

PO files

Translation workflow using PO files
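To make the PO format concrete, here is a minimal sketch in base R of what a PO file stores and how it is used. The fragment is hand-written for illustration, and the helpers `extract_field()` and `translate_string()` are hypothetical names, not part of any package. Real PO files also contain comments, headers, and multi-line strings, which this sketch does not handle.

```r
# A tiny, hand-written PO fragment: each entry pairs an original string
# (msgid) with its translation (msgstr). An empty msgstr means
# "not translated yet".
po_lines <- c(
  'msgid "Introduction"',
  'msgstr "はじめに"',
  '',
  'msgid "Key Points"',
  'msgstr ""'
)

# Pull the quoted text out of the lines that start with a given keyword
extract_field <- function(lines, keyword) {
  hits <- grep(paste0("^", keyword, " "), lines, value = TRUE)
  sub(paste0("^", keyword, ' "(.*)"$'), "\\1", hits)
}

msgid <- extract_field(po_lines, "msgid")
msgstr <- extract_field(po_lines, "msgstr")

# Lookup table: original string -> translation
catalog <- setNames(msgstr, msgid)

# Return the translation, or fall back to the original string
# when no translation has been entered yet
translate_string <- function(x, catalog) {
  out <- unname(catalog[x])
  ifelse(is.na(out) | out == "", x, out)
}

result <- translate_string(c("Introduction", "Key Points"), catalog)
print(result)
```

The fallback to the original string is what lets a partially translated lesson still render: untranslated passages simply stay in the source language.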

What is technical translation anyways?

Two aspects:

  • internationalization (i18n): Providing the framework to support translation (requires technical knowledge)

  • localization (l10n): Actually translating strings (requires linguistic knowledge)

Past approach (“Styles” format)

  • The current Carpentries lesson format is called the “Styles” format

Programming With Python built with the Styles template on 2022-01-27

  • The Styles format is based on Jekyll (and some other tools)

  • The translation system designed by David Pérez-Suárez used a tool called PO4gitbook

PO4gitbook

  • All translations controlled from a central repo with submodules for each lesson (can track changes)

  • Rendering was not possible for a typical translator; David Pérez-Suárez had to update the rendered lessons each time

  • No standard system for how to localize

    • Transifex (cloud-based)
    • Poedit (local PO file editor)
    • GitHub (online code review; used by the JA community)

Screenshot of https://github.com/carpentries-i18n/i18n

New approach (“Workbench” format)

  • The upcoming Carpentries lesson format is the “Workbench” format

Programming With Python built with the Carpentries Workbench on 2022-01-27

  • The Workbench format is based on R Markdown and pandoc

    • Rendering of lessons is greatly simplified
  • I am developing an R package called dovetail to facilitate translating lessons in the Workbench format

dovetail

  • Each translation is contained within its own lesson

  • Rendering is easily accomplished locally by the translator

  • Plan to have a standard system for translation (e.g., pushing to and pulling from Transifex)

dovetail

library(dovetail)

# Copy (untranslated) files needed for rendering lesson
create_locale("ja")

# Create PO files ----
create_po_for_locale("ja")

# Edit PO files ----
# for example, with
# usethis::edit_file("po/ja/01-introduction.po")

# Translate md files ----
# translate all (R)md files at once to `./locale/{lang}/`
translate_md_for_locale("ja")

# Build translated lesson ----
sandpaper::build_lesson("locale/ja/")

dovetail

Output of translation

|-- CONTRIBUTING.md             # - Carpentries Rules for Contributions
|-- README.md                   # - Describes lesson
...
|-- po                          # - NEW, contains PO files for translation
|   `-- ja/                     
|       |-- 00-introduction.po
|       |-- CONTRIBUTING.po     
|       |...                    
|-- locale                      # - NEW, contains translated files
|   `-- ja/                     
|       |-- CONTRIBUTING.md     # - NEW, translated markdown files
|       |-- site/               # - NEW, translated, rendered site
|           |-- built/          
|           |...               

dovetail design philosophy

  • Make it easier for maintainers to maintain (i18n)
    • Not dependent on one person maintaining one central repo
  • Make it easier for translators to translate (l10n)
    • Requires minimal technical knowledge to participate (don’t need git)

Promote participation in Carpentries by translating! 😀 👐